3 research outputs found

    Incremental k-Anonymous microaggregation in large-scale electronic surveys with optimized scheduling

    Improvements in technology have led to enormous volumes of detailed personal information made available for any number of statistical studies. This has stimulated the need for anonymization techniques striving to attain a difficult compromise between the usefulness of the data and the protection of our privacy. k-Anonymous microaggregation permits releasing a dataset where each person remains indistinguishable from k–1 other individuals, through the aggregation of demographic attributes, which are otherwise a potential culprit for respondent reidentification. Although privacy guarantees are by no means absolute, the elegant simplicity of the k-anonymity criterion and the excellent preservation of information utility of microaggregation algorithms have turned them into widely popular approaches whenever data utility is critical. Unfortunately, high-utility algorithms on large datasets inherently require extensive computation. This work addresses the need to run k-anonymous microaggregation efficiently with mild distortion loss, exploiting the fact that the data may arrive over an extended period of time. Specifically, we propose to split the original dataset into two portions that will be processed one after the other, allowing the first process to start before the entire dataset is received, while leveraging the superlinearity of the microaggregation algorithms involved. A detailed mathematical formulation enables us to calculate the optimal time for the fastest anonymization, as well as for minimum distortion under a given deadline. Two incremental microaggregation algorithms are devised, for which extensive experimentation is reported. The theoretical methodology presented should prove invaluable in numerous data-collection applications, including large-scale electronic surveys in which computation is possible as the data comes in. Peer Reviewed. Postprint (published version).
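The k-anonymity property described in this abstract can be illustrated with a minimal sketch. It is not one of the paper's incremental algorithms: it is a simple univariate, fixed-size grouping heuristic, and the function name `microaggregate` and the sample ages are hypothetical. Records are sorted, cut into groups of at least k, and each value is replaced by its group mean, so every released value is shared by at least k records.

```python
from statistics import fmean


def microaggregate(values, k):
    """Replace each value by the mean of a group of at least k sorted neighbours.

    Illustrative univariate heuristic only (an assumption for this sketch),
    not the incremental algorithms proposed in the paper.
    """
    n = len(values)
    if k < 1 or n < k:
        raise ValueError("need k >= 1 and at least k records")
    # Indices of the records in ascending order of value.
    order = sorted(range(n), key=lambda i: values[i])
    result = [0.0] * n
    n_groups = n // k
    for g in range(n_groups):
        start = g * k
        # The last group absorbs the remainder, so its size is between k and 2k-1.
        end = n if g == n_groups - 1 else start + k
        group = order[start:end]
        centroid = fmean([values[i] for i in group])
        for i in group:
            result[i] = centroid
    return result


ages = [23, 25, 31, 35, 36, 52, 58]  # hypothetical quasi-identifier values
anonymized = microaggregate(ages, k=3)
print(anonymized)
```

With k = 3, the seven ages fall into one group of three and one group of four, and the distortion introduced is the gap between each original age and its group mean.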

    Microagregació incremental k-anónima de grans volums de dades

    No full text
    This research proposal addresses fundamental challenges in privacy-enhancing technologies, related to the risk of statistical disclosure of sensitive information in large-scale datasets. The magnitude of its potential impact emanates from its broad applicability to information systems designed for the collection, analysis or dissemination of anonymized data in socioeconomic contexts including, but not limited to, healthcare, targeted advertising, personalized content recommendation, social networks and e-voting. Statistical disclosure control (SDC) concerns the post [...] Every one of us is constantly releasing data about our interests, preferences or affiliations, whether consciously or unconsciously. Technological advances have led to an enormous volume of data about each individual being available on the Internet. The use and publication of sensitive data (e.g. for social research or marketing purposes) have stimulated the need for anonymization techniques, where a compromise between the usefulness of the data and privacy protection is sought. k-Anonymous microaggregation permits releasing a dataset in which each person cannot be distinguished from at least k-1 other individuals, while maintaining similar statistical dependence between attributes. Currently, microaggregation algorithms are commonly used in this field thanks to the simplicity and quality they provide. However, on large datasets these methods are expensive in terms of computation time. This work addresses the need to run k-anonymization faster and more efficiently while introducing minimal distortion loss. To do so, we partition the original dataset into two fractions that are processed in two consecutive steps, which makes it possible to start even before the entire dataset has been received. Intuitively, the process is suited to anonymizing surveys, electoral processes, and all manner of polls, but the method has proved to be faster even without a head start.
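The closing claim of this abstract, that partitioning is faster even without a head start, follows from the superlinear running time of microaggregation. As a worked illustration only, assume a purely quadratic cost T(n) = c n^2 for n records; the exponent, the constant c, and the neglect of any cost for combining the two partial results are assumptions of this sketch, not figures from the source.

```latex
% Assumption: running time T(n) = c n^2 for n records (illustrative only).
\begin{align*}
T(n_1) + T(n_2) &= c\,n_1^2 + c\,n_2^2, \qquad n_1 + n_2 = n \\
                &= c\,(n_1 + n_2)^2 - 2c\,n_1 n_2 \\
                &= T(n) - 2c\,n_1 n_2 \;<\; T(n) \quad \text{for } n_1, n_2 > 0 .
\end{align*}
% Equal halves, n_1 = n_2 = n/2:
\[
T(n/2) + T(n/2) = \tfrac{1}{2}\,c\,n^2 = \tfrac{1}{2}\,T(n).
\]
```

Under this assumption, two equal portions cost half as much as one pass over the whole dataset, and any head start on the first portion shortens the completion time further.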

    Incremental k-Anonymous microaggregation in large-scale electronic surveys with optimized scheduling

    No full text
    Improvements in technology have led to enormous volumes of detailed personal information made available for any number of statistical studies. This has stimulated the need for anonymization techniques striving to attain a difficult compromise between the usefulness of the data and the protection of our privacy. k-Anonymous microaggregation permits releasing a dataset where each person remains indistinguishable from k–1 other individuals, through the aggregation of demographic attributes, which are otherwise a potential culprit for respondent reidentification. Although privacy guarantees are by no means absolute, the elegant simplicity of the k-anonymity criterion and the excellent preservation of information utility of microaggregation algorithms have turned them into widely popular approaches whenever data utility is critical. Unfortunately, high-utility algorithms on large datasets inherently require extensive computation. This work addresses the need to run k-anonymous microaggregation efficiently with mild distortion loss, exploiting the fact that the data may arrive over an extended period of time. Specifically, we propose to split the original dataset into two portions that will be processed one after the other, allowing the first process to start before the entire dataset is received, while leveraging the superlinearity of the microaggregation algorithms involved. A detailed mathematical formulation enables us to calculate the optimal time for the fastest anonymization, as well as for minimum distortion under a given deadline. Two incremental microaggregation algorithms are devised, for which extensive experimentation is reported. The theoretical methodology presented should prove invaluable in numerous data-collection applications, including large-scale electronic surveys in which computation is possible as the data comes in. Peer Reviewed.